```r
rm(list = ls())
set.seed(321) # For reproducibility

# Loading packages -------------------------------------------------------------
library(dplyr)      # Data manipulation
library(tidyr)      # Data pivoting
library(ggplot2)    # Visualisation
library(furrr)      # Parallel processing
library(kableExtra) # Creating tables
library(rchess)     # Working with chess objects
library(ggeasy)     # Makes theming plots easier

# Reading in data --------------------------------------------------------------
data <- tidytuesdayR::tt_load(2024, week = 40)
data <- data$chess
```
This document analyzes a dataset of chess games from Lichess provided by #TidyTuesday. It contains over 20,000 games, including information such as player ratings, move sequences, and other metrics. The main focus of this analysis will be on the specific moves played during each game.
# Chess Data
The dataset includes the move sequences made by each player, represented as a string of moves. Below is a preview of the first two rows of the dataset.
Each row displays the move sequence for a game, starting with White's move. For example, a game may open with the Slav Defense (1. d4 d5 2. c4 c6) or the Nimzowitsch Defense (1. d4 Nc6 2. e4 e5). Chess moves are typically recorded using Portable Game Notation (PGN), which makes it easy to replicate games.
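To make the structure of a raw move string concrete, here is a minimal base R sketch that tabulates a space-separated string by move number and side. The helper name `split_plies()` is hypothetical and is not part of the original analysis; it only assumes that moves alternate White, Black, White, and so on.

```r
# Hypothetical helper (not from the original analysis): tabulate a raw,
# space-separated move string by move number and side.
split_plies <- function(moves) {
  plies <- strsplit(moves, " ", fixed = TRUE)[[1]]
  n_moves <- ceiling(length(plies) / 2)
  # Pad with NA so a game that ends on White's move still fills both columns
  length(plies) <- 2 * n_moves
  data.frame(
    move  = seq_len(n_moves),
    white = plies[c(TRUE, FALSE)],  # odd positions: White's moves
    black = plies[c(FALSE, TRUE)],  # even positions: Black's moves
    stringsAsFactors = FALSE
  )
}

slav <- split_plies("d4 d5 c4 c6")
print(slav)
```

A game with an odd number of plies, such as `"e4 e5 Nf3"`, gets `NA` in the final `black` cell, which mirrors the case where the game ends on White's move.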
## PGN Conversion
To facilitate further analysis, we will convert the raw move strings into numbered PGN move text (for example, "d4 d5 c4 c6" becomes "1. d4 d5 2. c4 c6") using a custom convert_to_pgn() function.
```r
# Convert moves into PGN format ------------------------------------------------
# Note: game_id is accepted for use with mapply() below but is not used
convert_to_pgn <- function(moves, game_id) {
  # Split the moves string into a vector of individual moves
  move_list <- strsplit(moves, " ")[[1]]

  # Initialize an empty string for the PGN format
  pgn <- ""

  # Loop through the moves two at a time (each move is a pair: white and black)
  for (i in seq(1, length(move_list), by = 2)) {
    move_number <- (i + 1) / 2 # Move number calculation

    # Add both white's and black's moves, if available
    if (i < length(move_list)) {
      pgn <- paste0(pgn, move_number, ". ", move_list[i], " ", move_list[i + 1], " ")
    } else {
      # In case the game ends on white's move (no black move)
      pgn <- paste0(pgn, move_number, ". ", move_list[i])
    }
  }

  # Returning pgn string
  return(pgn)
}

# Converting moves into pgn format
chess_games <- data %>%
  select(game_id, moves) %>%
  mutate(
    game_id = seq_len(nrow(data)),
    # USE.NAMES = FALSE stops mapply() from naming each result after its input
    moves = mapply(convert_to_pgn, moves, game_id, USE.NAMES = FALSE))

# Displaying converted dataset
chess_games %>%
  rename(ID = game_id, Moves = moves) %>%
  head(n = 2) %>%
  kbl(align = "c") %>%
  kable_styling(
    full_width = FALSE,
    bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```
Now that our data is in PGN format, we can use the history_detail() method from the rchess package to extract the game history. However, as our dataset contains over 20,000 rows, this will take some time. Hence, we will implement parallel processing using the furrr package.
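As an illustration of that approach, here is a minimal sketch of replaying PGN strings in parallel. It is not a tuned implementation: the two short games stand in for the `moves` column of `chess_games`, and the choice of two `multisession` workers is an assumption that should be adjusted to the machine. It requires the furrr, future, and rchess packages.

```r
library(furrr)
library(rchess)

# Assumption: two local worker sessions; adjust `workers` to the machine
future::plan(future::multisession, workers = 2)

# Replay one PGN string and return its game history as a data frame
process_moves <- function(pgn) {
  game <- rchess::Chess$new()
  game$load_pgn(pgn)
  game$history_detail()
}

# Two short example games standing in for the `moves` column of chess_games
pgn_games <- c("1. d4 d5 2. c4 c6", "1. d4 Nc6 2. e4 e5")

# future_map() distributes the games across the workers and returns a list
# with one history per game
histories <- future_map(pgn_games, process_moves)

# Return to sequential processing, which shuts the workers down
future::plan(future::sequential)
```

Applied to the full dataset, the same `process_moves()` call could be used inside `mutate()` with `future_map(moves, process_moves)`, followed by `tidyr::unnest()` to flatten the per-game histories into one table.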
# Creating a chess board

```r
# Creating a chess board -------------------------------------------------------
board <- rchess:::.chessboarddata() %>%
  tibble() %>%
  select(cell, col, row, x, y, cc)

board %>%
  ggplot() +
  geom_tile(aes(x, y, fill = cc)) +
  scale_fill_manual(values = c("burlywood3", "burlywood4")) + # Traditional board colours
  theme_void() +
  theme(axis.text = element_blank(), axis.ticks = element_blank()) +
  easy_remove_legend()
```
Figure 1: Chess Board
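The board data comes from `rchess:::.chessboarddata()`, an unexported helper that may change between package versions. As a hedged alternative, the sketch below builds a comparable 8 x 8 table in base R. It is a reconstruction for illustration, not the rchess implementation: the name `board_sketch` and the `"b"`/`"w"` labels in `cc` are assumptions.

```r
# Hypothetical reconstruction of a board table; not the rchess implementation
board_sketch <- expand.grid(x = 1:8, y = 1:8)
board_sketch$col  <- letters[board_sketch$x]  # files a to h
board_sketch$row  <- board_sketch$y           # ranks 1 to 8
board_sketch$cell <- paste0(board_sketch$col, board_sketch$row)

# a1 (x = 1, y = 1) is a dark square, so an even x + y marks the dark squares
# Assumption: "b" labels dark squares and "w" labels light squares
board_sketch$cc <- ifelse((board_sketch$x + board_sketch$y) %% 2 == 0, "b", "w")

head(board_sketch)
```

Because it has `x`, `y`, and `cc` columns, a table like this can be passed to `geom_tile(aes(x, y, fill = cc))` in the same way as the rchess-derived board.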
Source Code

````markdown
---
title: "Visualising positional moves in chess"
description: "Analysis of #TidyTuesday's chess dataset"
date: "2024-09-28"
toc: true
format:
  html:
    page-layout: full
    html-math-method: katex
    code-tools: true
    self-contained: true
    code-fold: true
    code-summary: "Show the code"
categories:
  - TidyTuesday
  - data visualisation
draft: true
---

```{r setting-up, output = FALSE}
rm(list = ls())
set.seed(321) # For reproducibility

# Loading packages -------------------------------------------------------------
library(dplyr)      # Data manipulation
library(tidyr)      # Data pivoting
library(ggplot2)    # Visualisation
library(furrr)      # Parallel processing
library(kableExtra) # Creating tables
library(rchess)     # Working with chess objects
library(ggeasy)     # Makes theming plots easier

# Reading in data --------------------------------------------------------------
data <- tidytuesdayR::tt_load(2024, week = 40)
data <- data$chess
```

This document analyzes a dataset of chess games from [Lichess](https://lichess.org/) provided by [#TidyTuesday](https://github.com/rfordatascience/tidytuesday). It contains over 20,000 games, including information such as player ratings, move sequences, and other metrics. The main focus of this analysis will be on the **specific moves played during each game**.

# Chess Data

The dataset includes the move sequences made by each player, represented as a string of moves. Below is a preview of the first two rows of the dataset.

```{r data-moves}
data %>%
  select(game_id, moves) %>%
  mutate(game_id = seq_len(nrow(data))) %>%
  rename(ID = game_id, Moves = moves) %>%
  head(n = 2) %>%
  kbl(align = "c") %>%
  kable_styling(
    full_width = FALSE,
    bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```

Each row displays the move sequence for a game, starting with White's move. For example, a game may open with the **Slav Defense** (1. d4 d5 2. c4 c6) or the **Nimzowitsch Defense** (1. d4 Nc6 2. e4 e5). Chess moves are typically recorded using [Portable Game Notation (PGN)](https://www.chess.com/terms/chess-pgn), which makes it easy to replicate games.

## PGN Conversion

To facilitate further analysis, we will convert the raw move strings into numbered PGN move text using a custom `convert_to_pgn()` function.

```{r convert-to-pgn}
# Convert moves into PGN format ------------------------------------------------
# Note: game_id is accepted for use with mapply() below but is not used
convert_to_pgn <- function(moves, game_id) {
  # Split the moves string into a vector of individual moves
  move_list <- strsplit(moves, " ")[[1]]

  # Initialize an empty string for the PGN format
  pgn <- ""

  # Loop through the moves two at a time (each move is a pair: white and black)
  for (i in seq(1, length(move_list), by = 2)) {
    move_number <- (i + 1) / 2 # Move number calculation

    # Add both white's and black's moves, if available
    if (i < length(move_list)) {
      pgn <- paste0(pgn, move_number, ". ", move_list[i], " ", move_list[i + 1], " ")
    } else {
      # In case the game ends on white's move (no black move)
      pgn <- paste0(pgn, move_number, ". ", move_list[i])
    }
  }

  # Returning pgn string
  return(pgn)
}

# Converting moves into pgn format
chess_games <- data %>%
  select(game_id, moves) %>%
  mutate(
    game_id = seq_len(nrow(data)),
    # USE.NAMES = FALSE stops mapply() from naming each result after its input
    moves = mapply(convert_to_pgn, moves, game_id, USE.NAMES = FALSE))

# Displaying converted dataset
chess_games %>%
  rename(ID = game_id, Moves = moves) %>%
  head(n = 2) %>%
  kbl(align = "c") %>%
  kable_styling(
    full_width = FALSE,
    bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```

Now that our data is in PGN format, we can use the ``history_detail()`` method from the ``rchess`` package to extract the game history. However, as our dataset contains over 20,000 rows, this will take some time. Hence, we will implement parallel processing using the ``furrr`` package.

```{r game-history}
# Function to extract game history
# process_moves <- function(p) {
#   chss <- Chess$new()
#   chss$load_pgn(p)
#   chss$history_detail()
# }
#
# # Converting pgn format to game history
# chess_games <- chess_games %>%
#   mutate(data = future_map(moves, process_moves)) %>%
#   select(-moves) %>%
#   tidyr::unnest(cols = c(data))
#
# # Displaying converted dataset
# chess_games %>%
#   rename(ID = game_id) %>%
#   kbl(align = "c") %>%
#   kable_styling(
#     full_width = FALSE,
#     bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```

# Creating a chess board

```{r chess-board, warning=FALSE}
#| fig-width: 12
#| label: fig-chess-board
#| fig-cap: Chess Board

# Creating a chess board -------------------------------------------------------
board <- rchess:::.chessboarddata() %>%
  tibble() %>%
  select(cell, col, row, x, y, cc)

board %>%
  ggplot() +
  geom_tile(aes(x, y, fill = cc)) +
  scale_fill_manual(values = c("burlywood3", "burlywood4")) + # Traditional board colours
  theme_void() +
  theme(axis.text = element_blank(), axis.ticks = element_blank()) +
  easy_remove_legend()
```
````